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Abstract. Topological landscape is introduced for networks with functions 
defined on the nodes. By extending the notion of gradient flows to the net- 
work setting, critical nodes of different indices are defined. This leads to a 
concise and hierarchical representation of the network. Persistent homology 
from computational topology is used to design efficient algorithms for per- 
forming such analysis. Applications to some examples in social and biological 
networks are demonstrated, which show that critical nodes carry important 
information about structures and dynamics of such networks. 



Networks have become ubiquitous tools for describing structures that occur in 
a variety of fields in the past ten or fifteen years, including biology, social sciences, 
economics and engineering. To study a network, one has to endow it with some 
mathematical structure. The simplest mathematical structure on a network is the 
graph structure. This gives rise to notions such as degrees, paths, connectivity, etc. 
The distinctions between scale- free networks and small- world networks, for example, 
can be studied by examining this structure, see for example [f , 2, 4]. But one can 
endow a network with more sophisticated structures, such as geometric structure 
as in the theory of manifold learning, [5, 6, 7, 8, 9, 10], or topological structure as 
in the theory of persistent homology [11, 12, 13, 14]. These structures allow us to 
probe more deeply into the nature of the network. 

In this paper, we discuss how one can endow a network with a landscape when 
we study a function on the node set. The concept of landscape has been crucial 
in physics and chemistry in describing complex systems, such as energy landscape 
[15]. The introduction of such a concept into complex networks may equip us with a 
concise description of global structures of networks and help explain certain dynam- 
ics such as information diffusion and transition pathways. Many complex networks 
in real world carry flows of information, products, power, etc., which are driven 
by local gradients of a scalar or energy ]1G, 17]. For example traffic flows may be 
driven by congestion function, heat flows are driven by temperature. In biomolec- 
ular folding, conformational changes are driven by the free energy of states. On 
internet, user's attention may be driven by the centrality or significance of websites 
such as PageRank. In these cases, communities or groups emerge as metastable sets 
of gradient-based dynamics or energy basins. Therefore understanding the land- 
scape of such functions will be crucial to disclose associated dynamics in complex 
networks. 

In the core of the landscape lies the notion of critical nodes. In continuous setting 
this meets the classical Morse theory in the study of manifolds [19], where critical 
points can be located by vanishing gradients and their indices can be decided by 
dimensionality of the unstable manifold passing through. However such an approach 
can not be applied to the graph settings as there is no unambiguous definition of 
dimensionality in general. Precisely, consider an undirected graph G = {V, E) with 
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a function defined on the node set ft, : y — !■ IR. The question we will attempt to 
address is: given a function on its nodes, how can we endow the network with a 
landscape, so that one can distinguish critical nodes such as the local minima, local 
maxima, and saddles? 

There are several studies in the literature which may lead to critical nodes for 
graphs by carrying Morse theory to discrete settings. Nevertheless, none of them 
gives a satisfied answer to the question. In computational geometry one may embed 
the graph into a 2D-surface and then apply Morse theory for 2-mamfolds [20]. 
However, such a surface embedding is not natural for general graphs in biological 
and social networks. Another candidate is discrete Morse theory [21], which studies 
functions defined on all faces of cell complexes and is therefore hard to use in 
the graph setting above. A related subject is the extension of the Poincare-Hopf 
theorem to the graph setting, e.g. in [22[. 

In this paper we present a purely combinatorial approach which starts from a 
discrete gradient flow induced by the function on graph nodes. Such an approach 
does not need a surface embedding, and turns out to be closely related to persistent 
homology in computational topology [11, 13] and discrete Morse theory [21] without 
studying functions on high dimensional cells. In particular, given a function (often 
referred to as an energy function) on a network, we will deflne a discrete gradient 
flow associated with that function, as well as minimum energy paths between two 
disjoint sets of nodes. This allows us to deflne critical nodes or saddles. Roughly 
speaking, critical nodes are associated with minimum energy paths between node 
pairs: index-0 critical nodes are simply local minima; index-fc critical nodes are the 
highest energy transition nodes of minimal energy paths connecting index- (fc — 1) 
critical nodes. 

Such a critical node analysis, as we show by examples in social networks and 
biological networks, leads to a concise representation of networks while preserving 
some important structural properties. In short, the local minima or maxima to- 
gether with their attraction basins can be interpreted as communities or groups 
in networks; saddle points act as transition states between different critical points 
of lower indices. In particular, in social networks index-1 saddles act as hubs in 
connecting communities; in biomolecular dynamics, index-1 saddles play roles as 
intermediate or transition states connecting misfolded and native states. In the lat- 
ter, such an analysis does not rely on commonly used Markov state model, whence 
can be applied to much more general data analysis. Moreover, this approach leads 
to a hierarchical classiflcation of nodes in the network and a global visualization of 
networks adaptive to the landscape of given energy function. 

In algorithmic aspect, critical nodes in this paper can be computed at a polyno- 
mial time cost with an algorithm based on computational topology by monitoring 
topological changes over energy evel sets, and in nondegenerate case an almost 
linear algorithm exists which is scalable for the analysis of large scale networks. 

1. Landscape and Critical Nodes 

1.1. Discrete Gradient Flow. Throughout this paper we assume that h is injec- 
tive (one-to-one). Such functions are generic in the space of real functions on V. 
One may associate a gradient flow of h on the graph G, as a map Dhfi : 2^ — >■ 2^ 
which maps a subset of vertices to its immediate neighbors with lower h values. 
More precisely, given x € V, define the neighbor set of x with lower energy 
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M^{x) = {y e M{x) : h{y) < h{x)} and 

ijV-{x), iiAf-{x)^ 



(1) D^,oi{x}) 



{x}, otherwise. 



For any X CV, we define 

(2) Dh,oiX) = U Dh^oiix}). 

xex 

Let Q = D/,,0 o Dii Q, etc. We say that y is reachable from x, denoted by x >- y 
or y ^ X, if ?/ G o({*''}) some fc 6 N, i.e., we can find an energy decreasing 
path from x to y. 

Note that our construction of the gradient flow is related to, but different from 
the gradient network [Ki, 17], in which each node is only connected to its neighbor 
with the lowest energy {i.e. the neighbor in the steepest descent direction). We 
also remark that the gradient flow can be viewed as a "zero temperature" limit of 
the stochastic gradient flow introduced in [ ' ^] in the study of network communities. 

1.2. Local minima. The local minima of h are those vertices whose h value is no 
larger than the values of its neighbors. 

(3) Co = {x\h{x)<h{y),yyeAf{x)}. 

In other words, the set of local minima are precisely the maximal vertex set of fixed 
points of the gradient flow Dh^. 

Given a local minimum x Cz V , its attraction basin is defined to be: 

(4) Ao{x) = {y I DZ,{{y}) = {x}}. 

These are the points that reach the local minimum x but not any other local minima. 

Boundary or separatrix consists of those nodes which can reach more than one 
local minimum following the gradient flow 

(5) Bo = {x\ |i?^o(W)l > 1}. 

It is clear by definition that we have the non-overlapping decomposition 

(6) ^ = ^oU U "^o(^)- 

xeCo 

1.3. Index-1 critical nodes. Our next task is to classify the nodes in Bq. We do 
so according to their role in the pathways connecting the different local minima. 
In particular, index-1 critical nodes (saddles) are defined as the maxima on local 
minimum energy paths connecting different local minima. 

Clearly such a definition relies on the notion of local minimal energy paths, which 
depends on the topology of the path space. Given two local minima, we examine 
all the paths connecting them. If a path 71 can be deformed by the gradient flow 
to another path 72, we say that 71 is deformable to 72. The local minimum energy 
paths are paths which cannot be deformed by the gradient flow. 

To be more precise, given two points a, 6 G V, we define a path from a to 6 as 
7 = (uiQ • • • Wn) such that wq = a, Wn = b, and G 7V(w,;) for i = 0, • • ■ , n — 1. 
We denote the collection of paths from a to & as ^a,b- 

We note the following elementary lemma, whose proof is obvious. 
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Lemma 1. Let x )~ y, we can then find a path 7 = (wq • • • Wn) from x to y such 
that h{wi) > h{wi+i) for i = 0, 1, • • • ,n — 1. 

Given two paths 71,72 6 S^a,h^ we say 71 is deformable to 72, if there is a map 
F : 7i ^ 2'>'2, such that 

• (reaching) every node in 71 reaches some nodes in 72, i.e. for any x G 71, 

F(x) is not empty and for each y G F{x) C 72, y -< 

• (onto) every node in 72 is reachable from 71, i.e. for any y G 72, there 
exists X G 71, so that y G F{x), or equivalently, 

72 = U F{x). 

X671 

Let a, b be two local minima. We call a path 7 G ^a.b local minimum energy 
path, if it is not deformable to any other path in ^a,b- 

We define the energy of a path the maximal energy traversed by the path, i.e. 
^1(7) = maxyg^ h{y). From the definition, if 72 is deformable to 71, we have ^1(72) > 
/i(7i), so in terms of energy barrier. 71 is a more preferable path than 72. 

Given a local minimum energy path, we call the node of maximal energy on the 
path an index-1 critical node. The set of all index- 1 critical nodes is denoted by Ci. 
We will also call local minima index-0 critical nodes, and hence the notation Cq. 

The following fact gives a characterization of index-1 critical nodes. The proof 
can be found in the SL 

Proposition 1 (Classification of index-1 critical nodes). All local minima in Bq 
are index-1 critical nodes. The other index-1 critical nodes will reach one of the 
local minima in Bq by the gradient flow. 

We call the index-1 critical nodes that are also local minima in Bq the nonde- 
generate index-1 critical nodes, the set of which will be denoted as Ci. The other 
index-1 critical nodes are called degenerate. Not every index-1 critical node is a 
local minimum in Bq, for example in some cluster trees (see Figure 1). 

1.4. Higher index critical nodes. The procedure presented above can be ex- 
tended to define higher index critical nodes. 

To define index-2 critical nodes, we consider the subgraph with nodes in Bq and 
edges restricted on this subset, denoted by Gi = (Vi = Bo,Ei). The gradient flow 
Dfi^i : 2'^" — > 2^" on Bq is defined similarly as for Dfi Q. We define the attraction 
basins for a: G Ci as 

(7) Adx):^{yeBo\D^^,{{y}) = {x}}. 

Note that for any nondegenerate index-1 critical node, the attraction basin is 
nonempty. While for a degenerate index-1 critical node, the attraction basin is 
an empty set. This explains the notion "degenerate" for the critical nodes that are 
not local minima in Bq. 

We define the boundary set as 

(8) B, = {xeBo\\D^,i{x})\>l}. 

As shown in Proposition 1, all local minima on Bq are in Ci. Therefore, we have 
the decomposition 

(9) Bo = B,\J \J Ai{x) = Bi[J \J Ai{x). 
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Figure 1. Left: an example of degenerate index-1 critical node, 
where node 5 on top of the tree is a degenerate index-1 saddle 
while nodes 3 is a nongenerate index-1 saddle. Right: an example 
of both degenerate and non-degenerate critical node, where node 
7 on top of the tree is a degenerate index-1 saddle as it lies on 
the minimum energy path connecting local minima 1 (or 2) and 3 
(or 4), and as well a non-degenerate index-2 saddle as it is on the 
minimum energy path linking index-1 saddles 5 and 6. 

Analogously, we define index-2 critical nodes as the maxima on local minimum 
energy paths connecting different nondegenerate index-1 critical nodes. It is clear 
that index-2 critical nodes, if exist, must be in Bi. 

We remark that under our definition, a degenerate index-1 critical node can also 
be an index-2 critical node, as shown in Figure 1. This ambiguity is actually quite 
natural from the network point of view, as these points play multiple roles in the 
structure of the network. The degenerate index-1 critical node can lie either in the 
basin of a nondegenerate critical node or link together two different nondegenerate 
critical nodes. 

Higher index critical nodes can be defined recursively through further decom- 
position of Bi. Classification for high index critical points can be done following 
similar arguments as above. Combining these, we obtain: 

Theorem 1 (Node Decomposition). V admits the following decomposition 

V = Bo[j[j Ao{x) 
xeCo 

where 

Bk-i^Bk[j U Ak{x). 

xeCk 

Here Ak is the attraction basin of local minima restricted on the k — 1-th boundary 
set Bk-i andCk is the set of nondegenerate index-k critical nodes. 

The theorem gives us a hierarchical representation of the network associated 
to the energy landscape. It actually leads to a hypergraph representation whose 
hypernodes are made up of critical nodes with their attraction basins. 
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2. Finding Critical Nodes using Persistent Homology 

2.1. Persistent homology algorithm. 

The landscape introduced above can be naturally formulated in terms of a flood- 
ing procedure, from low to high values of the height function h : V ^ M.. Flooding 
starts from local minima, followed by the attraction basins. Once the relevant 
index-1 saddle is passed, basins of local minima are merged together. This proce- 
dure then continues on to critical points of higher indices. 

More precisely, this procedure can be described in terms of persistent homology. 
Persistent homology, firstly proposed by [11] and developed afterwards largely in 
[12, 13, 14[, is an algebraic tool for computing the Betti numbers and homology 
groups of a simplicial complex when its faces are added sequentially. To work 
with persistent homology, we extend the graph G into a simplicial complex up to 
dimension 2, and also define a filtration which consists of such simplicial complexes, 
in a spirit close to ["_'()[ for PL-manifolds. 

An abstract simplicial complex Sy is a collection of subsets of V, which is closed 
under deletion or inclusion, i.e. if cr G Sy, then r G for any t d a. 

We define the flooding complex of network G associated with the function h, 
^G./i Q 2^ as follows: 

• 0-simplex: the vertex set V; 

• 1-simplex: the vertex pairs {x, y : h{x) < h{y)} that x ^ y, i.e., y £ q{x) 
for some fc; 

• 2-simplex: collections of triangles {x,y,z : h{x) < h{y) < h{z)}, such that 
X ~< y and y ^ z. 

One can similarly extend the definition above to general fc-simplex. However for 
our purpose it suffices to define up to dimension 2 simplices. 

A filtration of fiooding complex Sg,/i is a nested family C Sg./i with J-t~i C 
J-t which respects the order of deletion or inclusion in YiQ.h, i-e. if cr G J^t and r C ct 
then T ^ Tt- 

Assume that /i : F ^ R is injective or one-to-one, which is generically the case. 
By taking the maximum over vertices, one can extend h from the vertex set to 
simplicies, and thus to the simplicial complex Sg./i- For a simplex a G Sg,/i let 
/i(cr) = max{/i(i) : i G cr}. This implies that a face's /i-value is always no more than 
that of its associated simplex, i.e. cr C r ^ ^{'^^ ^ ^(''')- 

A filtration {Tt : t G N) respecting the order of h can be defined in the following 
way: 

(1) -^0 = 0; 

(2) #{(7 G J-t+i\Tt : dim((7) = 0} = 1, i.e. there is precisely one node being 
added into the filtration for each step; 

(3) h(J-'t) < h(J-'t+i), where h[J-t) = max{ft,((T) : a G J^t}, i.e. when a node 
is added into the filtration, all the simplices of the same energy are added 
into the filtration simultaneously. 

Note that under this construction, J-i consists of the global minimum of /. 

In this construction, we consider the filtration corresponding to the flooding 
procedure from low to high h values. The change of Betti numbers identifies the 
index-0 and index-1 critical nodes. Once the filtration is defined, persistent homol- 
ogy computes the Betti numbers of the simplicial complex in Tt for each i G Z, 
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and draws the barcodes of Betti number versus the t or h values, e.g. using JPLEX 
toolbox^. The proof of the foUowing theorem is in the SI. 



Theorem 2. Consider the filtration (J-t). For all t G N, J-t+i\J-t contains an 
index-0 critical node if and only if increases from J-t to J-t+i; J-t+i\J^t contains 
an index-1 critical node if and only if either /3o decreases or /3i increases from J-t 
to Jt+i. 



To find higher index saddles, we restrict on the subgraph Gk = {Vk,Ek) where 
Vk = Bk~i = V\ Uo<i<fc_i Ua;gCi'4i(a;) and Ek consists of edges restricted on 14. 
We can analogously construct the filtration corresponds to the flooding procedure 
iJ'k.t,t G N) on the subgraph Gk- Similar identification holds for higher index 
saddles. 



Theorem 3. Consider the filtration {J-k,t) on subgraph Gk for k > 2. For alltGN 
.such that J-k.t+i\J^k,t contains an index-k critical node if either /3q decreases or j3i 
increases from Tk.t to Tk.t+i- 



Clearly our characterization of high order critical nodes above only exploits sim- 
plicial complex up to dimension 2, whose persistent homology computation is re- 
cently improved to be of complexity 0(m^ '^^^) [24] with m = 0{n^) the total 
number of simplices and n the number of nodes. Such a complexity does not suf- 
fer the curse of dimensionality as the computation of high order Betti numbers in 
general. 



2.2. Efficient Search of Nondegenerate Saddles. 

As we know from Proposition f that nondegenerate critical nodes are actually 
local minimum in sub-graphs G^, this leads to an efficient algorithm for finding 
nondegenerate critical nodes. In fact, all the examples shown in this paper have only 
nondegenerate critical nodes and thus can be found efficiently using this algorithm. 

Given an injective function h on the vertices, we obtain the local mimina and 
nondegenerate index-fc saddles using Algorithm 1. 

The bottleneck in this algorithm is in finding the attraction basins of local min- 
ima, whose complexity can be 0{nd) where n is the number of vertices and d is 
the maximum degree a node has. The total complexity is 0{Knd) where K is the 
maximum index of critical points. The algorithm is much faster than the previous 
algorithm for finding all critical nodes. 



http: / /compt op. Stanford. edu/programs/ 
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Algorithm 1 Fast search of nondegenerate critical nodes 

Sort the nodes according to h in increasing order; 

Set Go = G; 

for k = 0, . . . ,n do 

for X G Vfc in an increasing order of h do 

Find neiglibors of x with lower energy, Af^{x) = {y £ M{x) n 14 | h{y) < h{x)}; 
if M^r{x) = then 

Add x to Ck and set the color of x as its node index; 
else 

if Af^ (x) contains a single color then 
Set the color of x as the single color; 
else 

Leave the color of x as blank; 
end if 
end if 
end for 
return : 

(1) local minima Cfc as nondegenerate critical nodes; 

(2) attraction basins Ak{xo) {xo G Ck) as color components; 

(3) boundary Bk as the blank nodes; 

Set Gfc+i = {Bk,Ek+i) where Ek+i are edges restricted on Bk; 
end for 



3. Examples 

3.1. Zachary's Karate Club Network. Zachary's karate club network [2-S] con- 
sists of 34 nodes, representing 34 members in a karate club with node 1 being the 
instructor and node 34 being the president (Figure 2). An edge between two nodes 
means that the two members join some common activities beyond the normal club 
classes and meetings. Conflicts broke out between the instructor and the president 
when the instructor sought to raise the fee and the president opposed the proposal. 
The club eventually split into two, one formed by the president (blue nodes in Fig- 
ure 2(a)) and another one led by the instructor (red nodes in Figure 2(a)). A lot 
of information about this fission can be disclosed by looking at the graph structure 
of this social network. 

Let di be the degree of node i, and define hi = ~ logd^. To avoid the same degree 
between two nodes in neighbor, a small enough random perturbation is added such 
that hi = hi + €i is injective. Figure 2(b) shows the gradient flow of h. The arrows 
on the edges point from low degree nodes to high degree ones. Note that nodes 24 
and 25 both have degree 3, hence a small random perturbation is added resulting 
in the arrow from 25 to 26. The same is done for nodes 5 and 11. 

Figure 2(c) shows the node decomposition for Karate club network with each 
color component for a critical node and its attraction basin. Two local minima, 
nodes 1 and 34, are in oval shape together with their attraction basins marked 
in red and blue, respectively. Two index-1 saddles, nodes 3 and 32, are yellow 
and green diamond nodes, whose basins are in yellow (nodes 3) and green (node 
32) correspondingly. Node 3 is the lowest energy node connecting the local minima 
nodes 1 and 34 via a minimum energy path 71 = (1, 3, 33, 34). Node 32 links the two 
local minima by another local minimum energy path, 72 = (1, 32, 34). Two index-2 
saddles, nodes 25 (in light blue diamond) and 29 (in cyan diamond), which connect 
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Figure 2. Landscape of Karate Club, (a) The fission of Karate 
Club into two new clubs [2;^], the coach is node 1 and the president 
is node 34, where the box node joined the red club (coach) instead 
of the blue due to his necessity to finish the course, (b) A gradient 
flow on edges, node colors from blue to red indicate the energy from 
low to high, four nodes in diamond shape to-be-disclosed soon as 
critical nodes, (c) Node decomposition with each color component 
representing a critical node with its basin: two local minima are in 
oval shape in which node 1 has basin in red and node 34 in blue; 
two index- 1 saddles are in diamond shape in which node 3 has 
basin in yellow and node 32 in green; two index-2 saddles, node 25 
in light blue and node 29 in cyan, (d) A transition path analysis 
(SI) with source node 1 and target node 34. Committor function 
with thresholding probability 0.5 is used to divide all the nodes 
into two communities, one with node 1 in red and the other with 
node 34 in blue. Node size is in proportion to transition current 
connecting two communities through the node. Effective reactive 
currents from node 1 to node 34 are drawn with arrows on edges, 
whose width is determined from effective reactive current with a 
threshold greater than 0.001. It can be seen that index-1 saddles 
(3, 32) host a majority of transition currents. 
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Figure 3. Barcodes of Betti numbers for the filtration of Karate 
Club network. Top: /3o versus t. Node 34 with the lowest energy is 
added at t = 1 which creates a connected component which never 
disappears. Node 1 with the second smallest energy is added at 
t — 2 which creates a new connected component disappeared when 
index-1 saddle 3 is added at t = 4. Bottom: f3i versus t. The loop 
is created by index-1 saddle 32 added at i = 7 then cancelled by 
index-2 saddle 29 at t = 17. 



two index-1 saddles via two non-deformable minimal energy paths (3, 29, 32) and 
(3,28,25,32). Figure 2(d) further depicts a transition path analysis of a Markov 
chain induced on the graph (see SI) from local minimum node 1 to node 34, which 
shows two index-1 saddles capture most of transition currents. 

Figure 3 shows the barcodes for the flooding complex of this network. 

3.2. The social network of Les Miserables. The social network of Les Mis- 
erables, collected by Knuth [ ], consists of 77 main characters in the novel by Vic- 
tor Hugo. The edge weight Wij record the number of co-occurrence of two characters 
i and j in the same scene. Thus it is a weighted graph where hi = — log » ^ij 
as the negative logarithmic weighted degree. The original network exhibits a single 
local (global) minimum, Valjean, who is the central character as the whole novel 
was written around his experience. 

However, dropping those edges whose weights are no more than a threshold 
value (7 here), there appears a subnetwork which is closely associated with the 
Paris uprising on the 5th and 6th of June 1832, see Figure S-1. The subnetwork 
consists of two local minima, Enjoras and Valjean, the former being the leader of the 
revolutionary students called Friends of the ABC, the Abaisse. Led by Enjolras, 
its other principal members are Courfeyrac, Combeferre, and Laigle (nicknamed 
Bossuet) et al., who fought and died in the insurrection. Among them is an index-1 
saddle, Courfeyrac, a law student and often seen as the heart of the group, who 
introduced Marius to the Friends of ABC. Marius, a descend of the Gillenormands, 
though badly injured in the battle, was saved by the main character Valjean when 
the barricade fell and married to Cosette, the adopted daughter of Valjean. The 
landscape of this subnetwork highlights these events in the novel. 

3.3. LAO Protein Binding Transition Network. This application examines 
the binding of Lysine-, Arginine-, Ornithine-binding (LAO) protein to its ligand, 
recently studied in [26]. The critical node analysis provides us a concise summary 
of global structure of networks while preserving important pathways, which enables 
us to reach a more thorough description than previous approximate analysis. 

In [2()] a Markov state model was constructed with 54 metastable states, using 
data obtained from molecular dynamics simulation. More information about these 
states can be found in SI and [26]. Now we examine the transition network as 
a weighted directed graph G = {y,E,W), where V consists of 54 nodes, each 



THE LANDSCAPE OF COMPLEX NETWORKS 



11 



Boss net 




Figure 4. Landscape of a subnet of The Les Miserables Network. 
Edges are left with weights larger than 7. Two local minima, Val- 
jean and Enjoras as well as an index-1 saddle, Courfeyrac, are 
identified. 




Figure 5. Landscape of LAO protein binding. Local minima are 
represented as ovals, index-1 and index-2 saddles are shown in 
diamonds, circular nodes are regular nodes, and rectangular nodes 
are solvated states {43, 44, . . . , 53}. Color components represent 
the node decomposition. 
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representing a metastable state, an edge 6 if transitions from node i to j 
are observed in simulations with 6 ns delays (the implied time scale for approximate 
Markovian behavior), and the number of transitions is recorded as the weight Wij. 
Eleven of the states ({43, 44, . . . , 54}) are solvated or unbound states. The binding 
state is node 10. 

Let pij = Wij / Wij be the transition probability from state i to state j. This 
defines a Markov chain with a unique stationary distribution tt. We threshold this 

graph to an undirected graph by keeping those edges such that ^ — — > 

30; i.e. average count number is larger than 30. One reason for doing this is that 
small numbers of transitions may be heavily influenced by the noise caused by 
the way of counting the transition. Note that the mean transition count is about 
120, and the qualitative behavior reported below shows certain stability under the 
variation of the threshold value. 

The energy function is h{i) — — log7r(i) where tt is the stationary distribution 
of metastable states. Application of the method above gives rise to a landscape 
shown in Figure 5. Isolated states are dropped in this picture. Colors in this 
picture illustrate the node decomposition according to Theorem 1 , where each color 
component represents the attraction basin of a critical node. Below we shall discuss 
structural properties of these nodes. A complete picture of structural information 
for all 54 states can be found in SI. 

There are two major local minima in the landscape, nodes 10 and 18. Node 
10 is the bound state which is the minimum in the most populated energy basin. 
Its attraction basin is colored in light blue. Nodes 11 (population 13.5%) and 5 
(population 1.15%) are two encounter complexes in that basin. In these states, the 
ligand is in or close to the binding site and conformations in this state have a small 
twist but large opening angles. The other local minimum is node 18, a misbound 
state, where the ligand interacts with the protein outside the binding site and close 
to the hinge region of two domains of the protein. State 18, together with state 4, 
8, 9, and 20, forms a misbound basin marked in red. In these states, the ligand 
interacts with the protein from a distance to the binding site. State 8 and 9 exhibit 
similar structural properties with a negative twisting angle and a fixed distance to 
the binding site (about 10), while state 4, 18, and 20 exhibit similar but a different 
type of structures. 

Node 19 and 14 are two index-1 saddles connecting the two basins associated 
with local minima. They are metastable intermediate states between misbound 
and bound states. But in these saddles ligand interacts with the protein in differ- 
ent ways. In state 19 (population 32%), the ligand is interacting with the protein 
from one twisting direction (positive) and the protein is quite closed. In a con- 
trast, in state 14 (population 35%) the ligand is approaching the protein from the 
opposite twisting direction (negative) and the protein is still quite open (see SI). 
These two saddles actually play different roles in reactive pathways which will be 
discussed below. Node 2 is an index-2 saddle, which is essentially a high energy 
misbound state. Note that high index saddles are unstable with respect to different 
thresholding values. In the following we shall focus on index-1 saddles. 

For a quantitative analysis on the roles of index-1 saddles, we conduct two kinds 
of transition path analysis using transition path theory ([27] or see SI). First, we 
study reactive currents from the misbound state 18 to the bound state 10. This 
analysis shows that a majority of flux passes through the saddle 19. Therefore once 
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the ligand and protein fall in the misbound state 18, the major pathway to escape 
and enter the bound state is via saddle 19. 

The other analysis, as was also did in [2()], studies transition paths from the 
eleven solvated states marked from 43 to 53 to the bound state 10. In particular, 
we investigate reactive currents from each of the solvated states to the bounded 
state, respectively. The results are summarized as follows. A large part of these 
details has been ignored in [_'()], since they only examined 10 transition pathways, 
ignoring the others. 

(1) Solvated state 52 lies in the basin of bound state 10, whence misbound state 
18 has little influence on its pathway. 

(2) Solvated state 53 only passes through index-1 critical node 19 to enter the 
bound state 10, which is heavily influenced by the misbound state 18. 

(3) Solvated states {45,46,47} lie in the basin of index-1 critical node 14 and 
enter the bound state 10 directly or via 14. They are not much influenced 
by the misbound state 18. 

(4) Other solvated states are in the basin of index-2 critical node 2. Transition 
path analysis further shows that misbound state 18 has a stronger influence 
on them than those in the basin of 14. In particular state 50 is mostly 
influenced with near 50% of transition currents trapped by the misbound 
state 18. 

In summary, the misbound state 18 affects some of the pathways from solvated 
states to the bound state. Index-1 critical node 14 is a state where ligand starts 
to interact with protein to enter the encounter complex 11. If we can design some 
mutations to disrupt the stability of this state or even encounter complexes, we may 
be able to make the binding much more difficult. Finally we note that the critical 
node analysis here does not rely on the Markov model assumption and can thus 
be applied to the analysis of transition networks in molecular dynamics beyond its 
Markovian time scale. 

4. Discussion and Conclusion 

We have introduced a notion of critical points for network which can be used to 
reduce a complex network to a coarse-grained representation while preserving struc- 
tural properties associated with functional gradient flows. Examples have shown 
that the information obtained this way is of great value in capturing global struc- 
ture and dynamics of the network, such as diffusive or reactive pathways. Moreover, 
the critical point analysis leads to a hierarchical decomposition which may enable 
us to perform multiscale analysis of complex networks. These perspectives will be 
systematically pursued in the future. 

An interesting question is the stability of these objects against noise. To answer 
this question, one has to clarify the source of noise. There are two types of noise one 
should consider in landscape analysis of networks - one associated with the energy 
function h and the other associated with the network structure. The former can 
be dealt with traditional persistent homology denoising, where critical nodes with 
shallow basins can be merged with their saddles. The latter is however more chal- 
lenging as there are no systematic studies yet on perturbation or bootstrapping of 
networks. In the examples above, we used edge thresholding on the Les Miserables 
and the protein binding networks, which is equivalent to modeling such networks 
as a superposition of a signal graph and some Erdos-Renyi type random graphs 
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as noise. However there raight be better models which lead to different denoising 
rules. 

Acknowledgements 

W.E. acknowledges supports from ARO grant W911NF-07-1-0637 and ONR 
grant N00014-01-1-0674. J.L. is grateful to Eric Vanden-Eijnden for helpful dis- 
cussions. Y.Y. thanks Xuhui Huang for providing Figure S-2 in supporting in- 
formation with helpful discussions, as well as supports from the National Basic 
Research Program of China (973 Program 2011CB809105), NSFC (61071157), Mi- 
crosoft Research Asia, and a professorship in the Hundred Talents Program at 
Peking University. 

Supplementary Information 
S-1. Proofs 

Proof of Proposition 1. We show first that every local minimum in Bq must be an 
index-1 critical node. Let x be a local minimum in Sq. Then x reaches at least two 
local minima, say yi,y2 € Cq. Consider the subgraph with node set 

S = {{x} U A{yi) U A{y2)) n {y \ h{y) < h{x)}. 

Clearly, S is connected and x is the unique maximum node in S. By the definition 
of the attraction basin, the set S'\{a;} is not connected. 

Since S is connected, it contains at least a path from yi to y2 ■ Let 7 be the local 
minimal energy path from yi to j/2 in the subgraph S. As S'\{a;} is not connected, 
7 must pass x, so that ft. (7) = h(x). 

We now show by contradiction that 7 is also a local minimal energy path in the 
original graph V . Suppose we can find another path from yi to 2/2, called 7, so that 
7 is deformable to 7. For any z G 7, we have h{z) < h{x). Consider the set 7nBo, 
which is^non-empty. We distinguish two cases: ^ 

a) 7nSo = {a;}. Then, 7\{a;} C A{yi)yjA{y2), so that 7 C 5. By construction 
of 7, we have 7 = 7; 

b) If there exists z G 7nZ5o and z ^ x^'we have some point a;' G 7 that z -< x' . 
It is easy to see that x' must be x, since other points on 7 are in attraction 
basins of y\ and 2/2- Using Lemma 1, there exists a path 71 = (wq ■ ■ ■ Wn) 
from z to x ordered in energy increase. In particular, consider the point 
Wn-i, we have z -< Wn-i so that i(j„_i G Sq. Moreover, Wn-i G M{x) and 
h(wn-i) < h{x). This contradicts with the fact that a; is a local minimizer 
in Bo- 

Therefore, 7 is a local minimal energy path, and x is an index-1 critical node. 

Let z G Ci which is not a local minimum in Bq- Then, z must reach a local 
minimum x in Bq by the gradient flow. By the first part of the proposition, a; G Ci. 
The proposition is proved. □ 

Proof of Theorem 2. (Necessity). We first show that index-0 and index-1 critical 
nodes, when added into the filtration, will change Betti numbers in the way above. 

For index-0 critical nodes, they are local minima of graph G. When a local 
minima is added into the filtration, it must create a new connected component 
which increases the 0-th Betti number, f3o- 

Index-1 saddles will play a more complicated role. We have two situations 
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• if an index- 1 saddle lies on top of a global minimal energy path, it will 
decrease /3o upon being added; 

• if an index- 1 saddle lies on top of a local minimal energy path other than 
the global one, it will increase /Si upon being added. 

Given a pair of index-0 critical nodes 2/1,2/2 G Cg, among all local minimal energy 
paths connecting them (if exist), there must be a global minimal energy path 70, 
so that /i(7o) is less than any other local minimal energy paths between j/i and 2/2- 
We denote the maximal node of the global minimal energy path as x. Such x is an 
index- 1 critical node. When x is added into the filtration, the 0-th Betti number 
/3o will decrease as x connects two components contains 2/1 and 2/2 respectively. 

For the other local minimal energy paths connecting 2/1 and 2/2, the associated 
index- 1 critical nodes will increase the first Betti number /3i when added into the 
filtration. Indeed, let z be such an index-1 critical node. Thus z is a maximum 
of a local minimum energy path 71 such that /i(7i) = h{z) > h(x) = /i(7o). 7i is 
not deformable to the global minimal energy path 70 between 2/1 and 2/2 • Then two 
paths 7o and 71 forms a loop, and hence the first Betti number /3i increases when 
z is added into the filtration. 

(Sufficiency). We show next that no other nodes when added into the filtration 
will change the first two Betti numbers in the same way. 

For any node x which lies in the attraction basin of a local minima Ao{xo) 
for some xq ^ x, x reaches xq by gradient flow. For any edge {x,x'} G E with 
x' e ^o(a;o)) x' reaches xq and thus the triangle {x,x',xo} is included in the 
simplicial complex. This implies that Ao{xo) is contractible (star-shape), whence 
no node in ^0(2^0) other than local minimum a;o will change Betti numbers. 

It remains to show that any node in boundary i3o\Ci will not change Betti 
numbers in the same way. Any such node z £ Bq must reach at least two local 
minima, say a and b. Then by Lemma f there is a path 7 = (a — wq, . . . , z = 
Wk,---,b ~ wi) for some I G N such that h{ws) < h{ws+i) for s < k — 1 and 
h{'Ws) > h{ws+i) for s > k. Moreover z ^ Ci implies that 7 is deformable to a local 
minimal energy path tt = (a = vq, . . . ,b = «,„) between the same end nodes, for 
some TO G N. z can not decreases number of connected components as the path tt, 
which appears first in the filtration, already connects a and b. 

Now we show that the path 7 will not create a loop either. Let tt^ = c G Ci 
be the maximal node on tt. We must have c ^ z. To see this, as 7 is deformable 
to TT, there is a node c' — Wk' G 7 which reaches c G tt. We may assume c' ^ z 
ik' ^ k) since otherwise we are done. Then, by the construction of the path 7, we 
have c' ~< z, and hence c -< z. 

Note that both z and c reach both local minima a and b, node Wi with i < k 
{i > k) reaches a {b, respectively), and node Vi with i < t {i > t) reaches a {b, 
respectively). These will create a set of triangles such that 7 is homotopy equivalent 
to TT, i.e. loop-free. □ 

Proof of Theorem 3. The proof is analogous to that of Theorem 2. □ 

S-2. Current on edges and paths. Transition path theory 

The energy landscape gives us a global picture for the different attraction basins 
on the network. To understand the dynamics between the different basins, the 
transition path theory (TPT) provides a natural tool. 
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The transition path theory was originally introduced in the context of continuous- 
time Markov process on continuous state space [27] and discrete state space [28], 
see [2!)[ for a review. Another description of discrete transition path theory for 
molecular dynamics can be also found in [.!!i[. Here we adapt the theory to the 
setting of discrete time Markov chain with transition probability matrix P. We 
assume reversibility in the following presentation, the extension to non-reversible 
Markov chain is straightforward. 

Given two sets A and B in the state space V, the transition path theory tells 
how these transitions between the two sets happen (mechanism, rates, etc.). If we 
view A as a, reactant state and _B as a product state, then one transition from A 
to _B is a reaction event. The reactve trajectories are those part of the equilibrium 
trajectory that the system is going from A to B. To make the notion more precise, 
define the ordered family of times {n^, n^} such that 

X,^AeA, ,X,^BeB, 

J J 

XneV\iAUB), yn,nf<n<nf. 
Hence, a reaction happens from time to time . 

Definition 1. Given any equilibrium trajectory {X„}, we call each portion of the 
trajectory of between and a AB-reactive trajectory. We call the time during 
which the reaction occurs the reactive times 

(S-10) R=\J{nf,nf). 

The central object in transition path theory is the committor function. Its value 
at X gives the probability that a trajectory starting from x will hit the set B first 
than A, i.e., the success rate of the transition at x. Given two sets A and B in the 
state space, q satisfies the equation 

q{x) =0, X&A- 
q{x) = l, xeB, 

The committor function provides natural decomposition of the graph. If q{x) is 
less than 0.5, x is more likely to reach A first than B; so that {x | q{x) < 0.5} gives 
the set of points that are more attached to set A. 

Once the committor function is given, the statistical properties of the reaction 
trajectories between A and B can be quantified. We state several propositions 
characterizing transition mechanism from A to B. The proof of them is an easy 
adaptation of [27, 28[ and will be omitted. 

Proposition 2 (Probability distribution of reactive trajectories). The probability 
distribution of reactive trajectories 

(S-12) 7:R{x)=F{X,,=x,neR) 

is given by 



(S-13) t^r{x) = TT{x)q{x){l - q{x)). 
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The distribution ttr gives the equilibrium probabihty that a reactive trajectory 
visits X. It provides information about the proportion of time the reactive trajec- 
tories spend in state x along the way from A to B. 

Proposition 3 (Reactive current from A to B). The reactive current from A to 
B, defined by 

(S-14) J{xy) = P(X„ = X, X„+i = y, {n, 71 + 1} C R), 

is given by 

n{x)q{x)Pa:y{l - q{y)), x ^ y; 
0, otherwise. 



(S-15) Jixy) 



The reactive current J{xy) gives the average rate the reactive trajectories jump 
from state x to y. From the reactive current, we may define the effective reactive 
current on an edge and transition current through a node which characterizes the 
importance of an edge and a node in the transition from A to _B, respectively. 

Definition 2. The effective current of an edge xy is defined as 
(S-16) J'^ixy) = max{ J{xy) - J{yx), 0). 

The transition current through a node x \& defined as 

(S-17) T{x)^l Y.:,^yJ+{xy), xeB 

I E,ey^+(^2/)=E.ev^+(^y), x^AUB 

In applications one often examines partial transition current through a node 
connecting two communities V~ = {x : q{x) < 0.5} and = {x : q{x) > 0.5}, 
e.g. J^{xy) for x £ V~ , which shows relative importance of the node in 

bridging communities. 

The reaction rate defined as the number of transitions from Ato B happened 
in a unit time interval, can be obtained from adding up the probability current 
flowing out of the reactant state. This is stated by the next proposition. 

Proposition 4 (Reaction rate). The reaction rate is given by 

(s-18) v= J2 J^'^y)^ E "^(^y)- 

x^A,y^V x£V,y£B 

Finally, the committor functions also give information about the time proportion 
that an equilibrium trajectory comes from A (the trajectory hits A last rather than 
B). 

Proposition 5. The proportion of time that the trajectory comes from A ( resp. from 
B) is given by 

(S-19) / = ^^(x)g(x), = J27r{x){l-q{x)). 

x&V x^V 
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Figure S- 1 . The Les Miserables Network. The whole network has 
77 nodes as main characters in Victor Hugo's novel, Les Miserables, 
where Valjean is the only local minimum (global minimum) as the 
novel is written around his experience. 



S-3. Supplementary Figures 

The first figure is the whole co-appearance network of 77 main characters in the 
novel, Les Miserables, by Victor Hugo [2^)]. It is an undirected weighted graph with 
edge weights as the number of co-appearances for a pair of characters. Without 
thresholding this network contains one local minimum, Valjean. However a thresh- 
olding with edge weight greater than 7 gives rise to the subnetwork in the main 
text. 

The second figure contains a list of structural information on 54 metastable 
states. It contains a typical crystal structure in each state, and some free energy 
plots on certain reaction coordinates. From these pictures one can read various 
structural properties of critical nodes in LAO-protein binding transition network 
discussed in the main text. More information about this system can be found in 

The third figure shows the ranking of transition currents out of misbound state 
18 over eleven transition pathways. The experiment selects each of the eleven 
solvated states {43, . . . , 53} as the source set and the misbound state 10 as the 
common target set. In each of the eleven experiments, relative transition current 
out of state 18 divided by total transition current from the source, is recorded and 
plotted in a descending order. 
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Figure S-2. (Courtesy by Xuhui Huang) Three types of pictures 
for each of the 54 metastable states: on the left is the crystal struc- 
ture of a representative conformation in each state, on the right are 
free energy plots of the protein opening angle versus twisting angle 
(O, T) (red), as well as the distance between the ligand and the 
binding site versus the opening angle (L,0) (blue). The green and 
blue crosses correspond to X-ray structures of the bound (PDB ID: 
ILAF) and apo (PDB ID: 2LA0) conformations respectively. 
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Percentage of transition currents out of state 18 
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Figure S-3. Transition Currents out of misbound state 18, with 
source set from each of solvated states {43, . . . , 53} and target set 
as bound state 10. 
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